12.06.2019

Workshop on digital workflows

Introduction

Results of our little survey:

Name    Field          Research
Henrik  Bibliometrics  Statistics, coding, prose

Topics du jour:

  • The modern research cycle
  • The Open Science movement
  • Incorporating workflow thinking into your research

Part I: The modern research project

An idealised research project

Requirements

  • Data management plan
  • Publication plan
  • Dissemination plan

Data management plan

  • What do you collect?
  • How do you treat it?
  • How will you keep/share it?

Publication plan

  • Where do you plan to publish?
  • What part of the project will make it into which publications?

The publishing cycle

The publishing cycle, really

Dissemination plan

  • How will you present your research?
  • In which channels?

Is this you?

Why all this stuff?

  • Crisis of confidence in science
    • “Replication crisis”

This is the whole abstract of an interesting paper in the field of genomic biology:

The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers. A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.

Ziemann, M., Y. Eren, A. El-Osta (2016) Gene name errors are widespread in the scientific literature. Genome Biology 17:177
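A problem like this can be caught with a few lines of code before a gene list is shared. The following is a minimal sketch, not the method used in the paper: it only flags date-like entries such as "2-Sep" (what Excel makes of SEPT2) and ignores the floating-point conversions. The function name `find_date_like` and the sample list are assumptions made for illustration.

```python
import re

# Month abbreviations that appear when a gene symbol has been turned into a date.
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Matches date-like strings such as "2-Sep" or "1-Mar".
_DATE_LIKE = re.compile(rf"^\d{{1,2}}-(?:{_MONTHS})$", re.IGNORECASE)


def find_date_like(values):
    """Return the entries that look like dates rather than gene symbols.

    Hypothetical helper for illustration; it does not detect
    gene names that were converted to floating-point numbers.
    """
    return [v for v in values if _DATE_LIKE.match(v.strip())]


# Made-up sample list: two entries have been mangled into dates.
gene_list = ["TP53", "2-Sep", "BRCA1", "1-Mar", "DEC1"]
suspicious = find_date_like(gene_list)
print(suspicious)
```

Running a check like this as part of the workflow, rather than inspecting the spreadsheet by eye, is the kind of habit the rest of the workshop argues for.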

Storytime

Here are a few rows from some of the columns:

s4  s6  s7  s8  s9
4   4   1   NA  46
3   1   1   NA  125
3   1   1   NA  90
3   3   1   NA  156
4   5   1   NA  78

  • Only problem: I don’t know where I put the codebook!
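One way to avoid losing the codebook is to keep it in a machine-readable form next to the data and check the two against each other. The following is a minimal sketch under stated assumptions: the column names come from the table above, but the codebook entries are placeholders (the real meanings are unknown), and `undocumented_columns` is a hypothetical helper.

```python
import csv
import io

# The first rows of the table above, written as CSV text.
DATA = """s4,s6,s7,s8,s9
4,4,1,NA,46
3,1,1,NA,125
"""

# Hypothetical codebook: the descriptions are placeholders,
# not the real meanings of these variables.
CODEBOOK = {
    "s4": "placeholder description",
    "s6": "placeholder description",
    "s9": "placeholder description",
}


def undocumented_columns(csv_text, codebook):
    """Return the column names that have no entry in the codebook."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [name for name in header if name not in codebook]


missing = undocumented_columns(DATA, CODEBOOK)
print(missing)
```

If the check reports any column, the codebook is incomplete, and that is visible immediately instead of years later.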

Part II: How to deal with this?

Just don’t do it

The arrival of Open Science

[accountability, reproducibility, transparency]

Part III: Examples of digital workflows

Collaborating

  • From simple to

Keeping track

Documenting

Sharing

The trade-offs

  • There are powerful, efficient tools at our disposal
  • There is a learning curve of varying steepness
  • Maybe

Resources